Statistical distance

In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two samples, two random variables, or two probability distributions, for example.

Metrics

A metric on a set X is a function (called the distance function or simply distance)

d : X × X → R

(where R is the set of real numbers). For all x, y, z in X, this function is required to satisfy the following conditions:

d(x, y) ≥ 0 (non-negativity)
d(x, y) = 0 if and only if x = y (identity of indiscernibles; together with non-negativity, this condition gives positive definiteness)
d(x, y) = d(y, x) (symmetry)
d(x, z) ≤ d(x, y) + d(y, z) (subadditivity / triangle inequality).
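
As an illustration of the four conditions, the following is a minimal sketch that checks them by brute force on a small finite set of points. The helper name metric_violations, the sample points, and the tolerance are hypothetical choices made for this example. Passing the check on a finite sample does not prove that a function is a metric on the whole set X; it can only expose violations.

```python
from itertools import product

def metric_violations(points, d, tol=1e-12):
    """Return the names of the metric conditions that d violates on `points`."""
    violated = set()
    # Conditions involving pairs (x, y).
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:
            violated.add("non-negativity")
        if (abs(d(x, y)) <= tol) != (x == y):
            violated.add("identity of indiscernibles")
        if abs(d(x, y) - d(y, x)) > tol:
            violated.add("symmetry")
    # The triangle inequality involves triples (x, y, z).
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            violated.add("triangle inequality")
    return violated

points = [0.0, 1.0, 2.5, -3.0]           # arbitrary sample from the real line
absolute = lambda x, y: abs(x - y)       # the usual distance on R
squared = lambda x, y: (x - y) ** 2      # squared difference

print(metric_violations(points, absolute))  # set()
print(metric_violations(points, squared))   # {'triangle inequality'}
```

The squared difference fails only the triangle inequality (for example, 6.25 > 1 + 2.25 for the points 0, 1 and 2.5), which makes it a semimetric in the terminology used in this article.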

Many statistical distances are not metrics because they lack one or more of the properties listed above. For example, pseudometrics can violate the "positive definiteness" (alternatively, "identity of indiscernibles") property; quasimetrics can violate the symmetry property; and semimetrics can violate the triangle inequality. Some statistical distances are referred to as divergences.

Some important statistical distances include the following:

Other approaches

Signal-to-noise ratio distance
Mahalanobis distance
Distance correlation is a measure of dependence between two random variables; it is zero if and only if the random variables are independent.
The continuous ranked probability score is a measure of how well forecasts that are expressed as probability distributions match observed outcomes. Both the location and the spread of the forecast distribution are taken into account in judging how close the distribution is to the observed value: see probabilistic forecasting.
The Łukaszyk–Karmowski metric is a function defining a distance between two random variables or two random vectors. It does not satisfy the identity of indiscernibles condition of a metric: it is zero if and only if both its arguments are certain events described by Dirac delta probability density functions.
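
The failure of the identity of indiscernibles in the last item can be seen in a small numerical sketch. It assumes the commonly cited form of the Łukaszyk–Karmowski metric for independent random variables, D(X, Y) = E|X − Y|, which this article does not state explicitly, and it restricts attention to discrete distributions. The function name lk_distance and the example distributions are hypothetical.

```python
def lk_distance(xs, ps, ys, qs):
    """E|X - Y| for independent discrete X (values xs, probabilities ps)
    and Y (values ys, probabilities qs). Assumed form, see lead-in."""
    return sum(p * q * abs(x - y)
               for x, p in zip(xs, ps)
               for y, q in zip(ys, qs))

coin = ([0.0, 1.0], [0.5, 0.5])   # fair coin on {0, 1}
certain = ([1.0], [1.0])          # all probability mass at the value 1

print(lk_distance(*coin, *coin))        # 0.5
print(lk_distance(*certain, *certain))  # 0.0
```

Under this assumed definition, the distance from a fair coin to an independent copy of itself is 0.5 rather than 0, while a variable that takes a single value with certainty has distance 0 to itself.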
